Activity Modelling using Email and Web Page Classification

Authors

  • Belinda Richards
  • Judy Kay
  • Aaron Quigley
Abstract

This work explores the modelling of a user’s current activity using a single document and a very small collection of classified documents. We developed a model called WeMAC, which accretes evidence from heterogeneous sources to give a final classification of the user’s activity. The evidence sources considered here are the different attributes of a document. We evaluate the WeMAC model on two document types, emails and web pages; assess its performance on both tiny and larger document sets; and compare it against a “one bag” approach. The WeMAC model compared well with existing results for systems with similar tasks and larger datasets, with average F1 values of 0.5 to 0.7.


Similar resources

A Novel Approach to Feature Selection Using PageRank algorithm for Web Page Classification

In this paper, a novel filter-based approach is proposed using the PageRank algorithm to select the optimal subset of features as well as to compute their weights for web page classification. To evaluate the proposed approach multiple experiments are performed using accuracy score as the main criterion on four different datasets, namely WebKB, Reuters-R8, Reuters-R52, and 20NewsGroups. By analy...


Multi-View Learning for Web Spam Detection

Spam pages are designed to maliciously appear among the top search results by excessive usage of popular terms. Therefore, spam pages should be removed using an effective and efficient spam detection system. Previous methods for web spam classification used several features from various information sources (page contents, web graph, access logs, etc.) to detect web spam. In this paper, we follo...


Data Extraction using Content-Based Handles

In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. Our extraction algorithm is inspired by the way a human user scans the page content for specific data. In particular, we use text fea...


Homogenization of interlocking masonry structures using a generalized differential expansion technique

I. Stefanou, J. Sulem and I. Vardoulakis. (a) Department of Applied Mathematics and Physics, National Technical University of Athens, Zografou Campus, Greece; web page: http://geolab.mechan.ntua.gr/ (corresponding author). (b) UR Navier, CERMES, Ecole des Ponts ParisTe...


Resources classification using fractal modelling in Eastern Kahang Cu-Mo porphyry deposit, Central Iran

Resources/reserves classification is crucial for the block model creation used in mine planning and feasibility studies. Selection of estimation methods is an essential part of mineral exploration and mining activities. In other words, resources classification is an issue for mining companies, investors, financial institutions and authorities, but it remains subject to some confusion. The aim of t...




Publication date: 2005